Common phrases and minimum-space text storage
Related resources
Disambiguating Cue Phrases in Text and Speech
Cue phrases are linguistic expressions such as 'now' and 'well' that may explicitly mark the structure of a discourse. For example, while the cue phrase 'incidentally' may be used SENTENTIALLY as an adverbial, the DISCOURSE use initiates a digression. In [8], we noted the ambiguity of cue phrases with respect to discourse and sentential usage and proposed an intonational model for their disamb...
Extraction of Significant Phrases from Text
Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in this document are provided. Although keyphrases are useful, not many documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly. Therefore, there is a need for automatic keyphrase extraction. This pape...
Classifying Cue Phrases in Text a
Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (CGRENDEL and C4.5) are used to induce classification rules from sets of pre-classified cue...
Statistical Phrases in Automated Text Categorization
In this work we investigate the usefulness of n-grams for document indexing in text categorization (TC). We call n-gram a set tk of n word stems, and we say that tk occurs in a document dj when a sequence of words appears in dj that, after stop word removal and stemming, consists exactly of the n stems in tk, in some order. Previous research has investigated the use of n-grams (or some varia...
Detecting multiword phrases in mathematical text corpora
We present an approach for detecting multiword phrases in mathematical text corpora. The method is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names, or nouns. The detection of multiword groups is done algo...
Journal
Journal title: Communications of the ACM
Year: 1973
ISSN: 0001-0782,1557-7317
DOI: 10.1145/361972.361982